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Abstract Optimization problems with constraints in the form of a partial 

differential equation arise frequently in the process of engineering design. The 
discretization of PDE-constrained optimization problems results in large-scale 
linear systems of saddle-point type. In this paper we propose and develop a 
novel approach to solving such systems by exploiting so-called quasiseparable 
matrices. One may think of a usual quasiseparable matrix as of a discrete 
analog of the Green's function of a one-dimensional differential operator. Nice 
feature of such matrices is that almost every algorithm which employs them 
has linear complexity. We extend the application of quasiseparable matrices 
to problems in higher dimensions. Namely, we construct a class of precon- 
ditioners which can be computed and applied at a linear computational cost. 
Their use with appropriate Krylov methods leads to algorithms of nearly linear 
complexity. 
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1 Introduction 

In this paper we introduce a new class of structured matrices that we sug- 
gest to call multilevel quasiseparable. It generalizes well-known quasiseparable 
matrices to the multi-dimensional case. Under the multilevel paradigm, param- 
eters that are used to represent a matrix of a higher hierarchy are themselves 
multilevel quasiseparable of a lower hierarchy. The usual one-dimensional qua- 
siseparable matrix is the one of the lowest hierarchy. 
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Quasiseparable matrices found their application in orthogonal polynomials 
[1], root finding [4], system theory [2] and other branches of applied mathe- 
matics and engineering. Consequently, there has been major interest in them 
in the scientific community, and many fast algorithms for working with them 
have been developed in the last decade. Due to the similarity in represen- 
tations of multilevel quasiseparable and usual quasiseparable matrices, it is 
possible to extend the applicability of almost all the algorithms developed for 
quasiseparable matrices to the new class. It is a celebrated fact that operations 
with quasiseparable matrices can be performed in linear time with respect to 
their size. We show that under some conditions this is also true for multilevel 
quasiseparable matrices. 

Natural applications of multilevel quasiseparable matrices are discretized 
Partial Differential Equations (PDEs) and related problems. There have al- 
ready been recent attempts to use structured matrices similar to quasisepa- 
rable ones to solve discretized PDEs p51ll9) . The very distinct feature of our 
approach is the ability not only to solve systems in linear time, but also to 
invert matrices and to perform arithmetic operations with them in linear time. 
Therefore, we can solve, for instance, saddle-point systems with PDE matrices 
that former approaches did not allow. 

The structure of the paper is as follows: we start with a motivation for 
this paper by considering a simple PDE-constrained optimization problem in 
Section [2] This type of problems leads to systems in a saddle-point form. The 
need in efficient solvers for such systems motivated this paper. In Section [3] we 
briefly review quasiseparable matrices and related results. Section |4] concerns 
the extension of quasiseparable matrices to the multi-dimensional case. We 
give there a formal definition of multilevel quasiseparable matrices, introduce 
their parametrization and fast arithmetic with them. The last section presents 
results of preliminary numerical experiments that empirically confirm linear 
time complexity of algorithms with multilevel quasiseparable matrices. 



2 Model PDE-constrained optimization problem. 

As a simple model let us consider a distributed control problem which is com- 
posed of a cost functional 

mmhu-u\\l + mf (1) 

to be minimized subject to a Poisson's problem with Dirichlet boundary con- 
ditions: 

-^'" = -^^!i^^' (2) 
u = g on Oi2. ^ ' 

The problem consists in finding a function u that satisfies the PDE con- 
straint and is as close as possible in the L2 norm sense to a given function u 
("desired state"). The right-hand side of the PDE / is not fixed and can be 
designed in order to achieve the optimality of u. In general such a problem 
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is ill-posed and thus needs Tikhonov regularization that is introduced in the 
form of the second term in (jlj. 

The usual approach to solving the minimization problem ([T]) - ([2| numer- 
ically is to introduce a weak formulation of it and to discretize the latter by 
finite elements, see, for instance, PTIIT^ . After these steps, the discrete analog 
of ([l]) - ([2]) reads 



min l\\uh 



(3) 



subject to 

where Vq and are finite-dimensional vector spaces spanned by test func- 
tions. If . . . (/)„} and {01, . . . (/)„, 0„+i, . . . 4)n+dn} are bases in Vq and V^, 
respectively, and Uh and vt are represented in them as 



n-\-dn 



then the matrix form of ([s]) is 



min - Vl^ Mvl — U"^b - 
u,f 2 

subject to A'u = Mi 



c + /3f ' Aff , 
hd, 



where Mij — J^^ (t>i4'j 
b 



mass matrix, Kij = J^^ V^iV^j 

n-\-dn 
k=n+l 

The Lagrangian associated with problem Q is 



(4) 



stiffness matrix, 



5 In 



1 



"2/3M -M' 




"f" 




0' 


M 




u 




b 


-M K 




A 




d 



/:(u, f , A) = -u^Mu -VL^b + c + iSt'^Mf + X^{Ku -Mi- d), 

where A is a vector of Lagrange multipliers. The condition for a stationary 
point of C define u, f and A via the solution of the linear system 



(5) 



This linear system is of saddle-point type. We refer to j3j for an extensive 
survey of numerical methods for this type of systems. 

Matrix in ([5]) is symmetric, usually very large but sparse due to finite ele- 
ment discretization. Thus matrix-vector multiplication is cheap and Krylov 
subspace methods such as Minimal Residual (MINRES) method of Paige 
and Saunders [l^ or Projected Preconditioned Conjugate Gradient (PPCG) 
method of Gould, Hribar and Nocedal [15] are well suited. Their convergence 
nevertheless depends highly on the choice of a good preconditioner [21) . 
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In this paper we propose a structured matrices approach to the solution 
of systems of the type described above. First, let us write the block LDU 
decomposition of ([s]): 



2I3M -M 

M 
-M K 



/ 
/ 

' 1 



5* = - 



2/3 



-M 



2PM 
M Q 
OS* 



■^0 ■ 

/ M-^K^ 
/ 



(6) 



The hardest part in using this decomposition for solving a system is to compute 
the Schur complement S of the (3,3) block and to solve the corresponding 
system with it. Since matrix S is usually dense, this task is untractable if we 
use its entrywise representation but it is feasible if we use a proper structured 
representation. In Section [4] we will show that Schur complement indeed has a 
structure and this structure is the one called multilevel quasiseparable. We will 
give all necessary definitions and show that the use of multilevel quasiseparable 
matrices leads to asymptotically (with respect to the size of the system) linear 
storage and linear complexity algorithm for solving ([5]). 



3 Quasiseparable matrices. 

Matrix of a rank much smaller than its size is a discrete analog of a separable 
function. More generally, we may think of a matrix with certain blocks being 
low rank rather than the whole matrix. This case corresponds to a function 
being separable in some subdomains of the whole domain. One simple example 
of such a function is Green's function of the Sturm-Liouvillc equation (this 
example will be considered in some more details later in this section). There 
is no surprise that matrices with low rank blocks found their applications in 
many branches of applied mathematics, especially in integral equations. There 
are several related classes of rank-structured matrices that one can find in the 
scientific literature. Among the most well-known are semiseparable [Hj, quasi- 
separable [9., 77-matrices [16], ff ^-matrices [17] and mosaic-skeleton matrices 
[23] . In the current paper our main tool will be quasiseparable matrices [24] 
having low rank partitions in their upper and lower parts. The formal defini- 
tion is given below. 

Definition 1 (Rank definition of a block quasiseparable matrix) Let 

A be a block matrix of block sizes {rikY^^i then it is called block (r',r")- 
quasiseparable if 

m^x T8inkA{K + 1 : N,l : K) < r\ max rank^(l : K,K + 1 : N) < r", 

k n 

K ^^Ui, N = ^ni, 

i=l 1=1 

where (r") is called lower (upper) order of quasiseparability. 
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In what follows we will call block {r\r^)- quasiseparablc matrices simply 
quasiseparablc for shortness. 

A well-known example of a quasiseparable matrix is given by the discretized 
Green's function of a one-dimensional differential equation. Consider a regular 
inhomogeneous Sturm-Liouville boundary value problem: 



(p{x)u')' ~ q{x)u = /(x) 

J aiu{a) -\- /3iw(a) = 0, 
I a2u(6) + /32m(6) = 0, 



(7) 



where functions p, p' , q are continuous on [a, b], p{x) > on [a, b] and \ai \ + 
|/3i| ^ for i = 1, 2. It is a classical result in ordinary differential equations 
that the solution f{x) of ([T]) is given by 



u{x) 



where g{x,(,) is the so-called Green's function. In this case it has an explicit 
formula 

1 ^ \ ui{x)u2{0, a<x<^, 
p{a)W{a) ^ \ui{0u2{x), ^ < x < b, 



(8) 



with W{x) being Wronskian and ui{x) and U2{x) being two specific linearly 
independent solutions of ([t]). It is easy to see from Q that discretized Green's 
function is a quasiseparable matrix of order one. 

In order to exploit the quasiseparability of matrices in practice one must 
have a low-parametric representation of them. There are many alternative 
parametrisations of quasiseparable matrices all of which use 0{N) parameters, 
where N is the size of the matrix. Having such a parametrisation at hand one 
can write most of the algorithms, e.g., inversion, LU or QR decomposition, 
matrix-vector multiplication in terms of these parameters and, therefore, the 
complexity of these algorithms is 0{N). In this paper we will use the so- 
called generator representation (Definition[2]below) proposed by Eidelman and 
Gohberg [9]. There are several alternative parametrisations such as Givens- 
weight and Hierarchical Semiseparable (HSS) [6 . 

Definition 2 (Generator definition of a block quasiseparable matrix) 

Let A be a block matrix of block sizes {nk}]!.^^ then it is called block (r', r")- 
quasiseparable if it can be represented in the form 



di gih2 

P29l d2 

P3.a2qi P3q2 



gihhs 
da 



Pn^n 



■ 0251 P„a„_i . . . 03^2 Pnan 



5162 ■■■b, 
.92^3 ■■■b; 
gsbA . . . bn-ih 



ih„ 



(9) 
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Table 1 Sizes of generators of a block quasiscparable matrix. 





dk 


Qk ife 


Pk 


9fc 


bk 


hk 


# of rows 






■n-k 






'fc-1 


# of cols 


nk 


nk r-^_i 




' k 


' k 


k 



where parameters (called generators) {dfc, q^, Ofc, Pfe, fffc, fcfc, /ifc} are matrices 
of sizes as in the table below. 

Orders of quasiseparability r' and r" are maxima over the corresponding 
sizes of generators: 

— max ri, r" = max 

l<fe<n-l l<fc<n-l 

Remark 1 Generators are not unique, there are infinitely many ways to repre- 
sent the same quasiseparable matrix with different generators. For the relation 
between different sets of generators see [TT] . 

Remark 2 It is obvious that any scalar quasiseparable matrix can be converted 
to the block form by simple aggregation of its generators. 

Remark 3 Sizes r\. and of generators are directly related to ranks of sub- 
matrices in the lower and upper parts, respectively. Namely, if we let K to be 

the block index: K ^ then 

i=l 

TmkA{K + l:N,l:K)<r{, TimkA{l : K,K + 1 : N) <rl. (10) 
Moreover, for any quasiseparable matrix there exists a set of generators with 



sizes rj. and that satisfy (10) with exact equalities (such generators are 



called minimal). For their existence and construction see |llj . 

One remarkable property of quasiseparable matrices is that this class is 
closed under inversion [9tl22j. For instance, inverse of a banded matrix is not 
banded but both matrices are quasiseparable. Due to the low-parametric rep- 
resentation in Definition [2] most of the algorithms with quasiseparable matrices 
have linear complexities. Table [2] lists these algorithms along with the corre- 
sponding references. 



Table 2 0{n) algorithms for quasiseparable matrices 

A*v LU QR A-i A*B A±B 



Theorem 



[T] [IQ] [9] [1 - [U 



The ability to solve systems with quasiseparable matrices in 0{n) oper- 
ations is essential for the purpose of this paper. One of the ways to do it is 
through the use of fast LU decomposition. We next derive LU decomposition 
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algorithm for a general block quasiseparable matrix in terms of the generators 
it is represented with. A restricted version of this algorithm applicable to diag- 
onal plus semiseparable matrices (a subclass of quasiseparable matrices) was 
derived in [T3] and some formulae of the current algorithm are similar to the 
ones in the inversion algorithm given in H]. Still, to the best of our knowledge, 
current LU decomposition algorithm has never been published and may be 
useful to those who need a fast system solver for quasiseparable matrices. The 
complexity of the algorithm is 0{N) and it is valid in the strongly regular case 
(i.e. all of the principal leading minors are non-vanishing). First, let us note 
that quasiseparable structure of the original matrix implies the quasiseparable 
structure of L and U factors. The theorem below makes this statement precise 
and, in addition, relates generators of an original matrix to generators of the 
factors. 



Theorem 1 Let A he a strongly regular N x N block {r' ,r'^)-quasiseparable 
matrix given by generators {dk, qky^k^Pk, 9k,bk, h^} as in ^. Let A^LU be 
its block L U decomposition of the same block sizes. Then 

(i) Factors L and U are (r', 0)- and (0, r'^)- quasiseparable. Moreover, r\,{L) = 
rl{A) and r^(C/) = rl{A) for all k = 1, . . . ,n - 1. 

(a) L and U are parametrized by the generators {Ik, qk,ak,Pk, 0, 0, 0} and {dk, 
0, 0, 0, gk, bk, hk}, where Ik are identity matrices of sizes Uk x Uk and new 
parameters qk, dk and gk can be computed using the following algorithm: 



Algorithm 1 Fast quasiseparable LU decomposition. 

Input: dk,qk,ak,Pk,gk2bk,hk 
l:di = di, qi=qid^^, gi = gi, = q^g^ 

2: for k = 2 to n — 1 do 
3: dk = dk - Pkfk-ihk 
4: £fc = {qk - o.kfk-lhk)d^^ 
5: gk = 9k - Pkfk-lh^ 
6: fk = a-kfk-lbk + qkdk 
7: end for 

8: dn — dn Pn fn — l^n • 

Output: dk,qk,gk 



Proof Statement (i) of the theorem follows from statement (ii), so we only 
need to prove the latter part. 

k 

Denote, as usual, by K the block index: K = "-i ^'^^'^ note that quasi- 

i=l 

separable representation p| implies the following recurrence relation between 
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the blocks of A: 
A 



A{1 ■.K,1:K) GkHk+i 

Pk+iQk \A{K + 1:N,K+1:N)_ 

Qi^qi, Qk = [akQk-i Qk], fc = 2, ...n-l; 

Pn=Pn, Pk = [pk ; Pk+io-k], fc = n-l,...2; 

Gi^gi, Gk = [Gk-ibk ; gk], /c = 2,...n-l; 

Hn = hn, Hk = [hk bkHk+i], k = n - 1, . . .2. 



(11) 



The proof will be constructed by induction. We will show that for each k 

(12) 



Ak 
^11 


GkHk+i 




r 

^11 


Q' 




■ Tjk 


GkHk+1 


Pk+iQk 


-k 




Pk+iQk 












For fc = 1 we get from ( 12 1: 

P2Q1 =P2Qldi, 

G1H2 = G1H2, 



di = di, 
qi = qid^'^, 



9i = 5i- 

Let us introduce an auxiliary parameter fk = QkGk- It is easy to show by 



using (11) that fk satisfies the recurrence relation 

/i = fk = cikfk-ibk + qkljk- 



Assume that (12 1 holds for some fixed k, then it holds for fc + 1 if 

dk+i ~ bfe+iQfc 1] • [Gfe/ifc+i ; dfc+i]: 

Pk+2<lk+l — Pk+lQk+l • [Gkhk + 1 ; rffe+l]; 
gk+lHk+2 = [Pk+lQk 1] ■ Gk+lHk+2- 

The first equality simplifies 

dk+i = Pk+iQkGkhk+i + dk+i = Pk+ifkhk+i + rffe+i. 



(13) 

(14) 
(15) 
(16) 



The second equality ( 15 1 holds if 

qk+i = Qk+i ■ [Gkhk+i ; dk+i] — 

= [a.k+iQk qk+i] ■ [Gkhk+i ; dk+i] — ak+ifkhk+i + qk+idk+i- 



Finally, the last equality ( 16 1 is true if 

9k+i = [Pk+iQk l]-Gfc+i = [pk+iQk ^-IGkbk+i ; gk+i] = Pk+ifkbk+i+Hk+i- 

Matrix dk+i is invertible because, by our assumption, matrix A is strongly 
regular. Hence, we conclude that ( 12 ) is true also for index fc + 1 if generators 
Qfe+i, dk+i and 'gk+i are those computed in Algorithm [l] The assertion of the 
theorem holds by induction. □ 
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4 Multilevel quasiseparable matrices. 



In the previous section we have emphasized the relation between second-order 
ordinary differential equations and quasiseparable matrices. Let us now look 
at the problem in two dimensions that extends ([7|: 



d_ 
dx 



d_ 
dy 



dy 



r{x,y)u{x,y) = .f{x,y) 



for {x,y) e i7, where SI — [0, 1] x [0, 1] with homogeneous Dirichlet boundary 
conditions. The standard five-point or nine-point discretization of this problem 
on a n X m uniform grid leads to a system of linear algebraic equations of the 
form 



Ci A2 B2 



Bm- 

^1 A, 





Ml 




7r 








/2 








_fm_ 



(17) 



where we have assumed that Ui are the discretized unknowns along the i'th 
column of the nxm grid. In this case each of Ai, Bi, and C'i is an n x n matrix 
and the whole block matrix has size nm x nm. Furthermore, each of the Ai, 
Bi, and Ci is a tridiagonal matrix. 



One way to solve system (17) is to do block LU factorization assuming 
that it exists. When we eliminate the block entry Ci we get in the position 
occupied by A2 the new block 

82^ A2-C\A^^Bi. 

Observe that even though all the individual matrices on the right-hand side 
of the expression are tridiagonal, A^^ is not, and hence 5*2 is a dense (non- 
sparse) matrix. At the next step of Gaussian elimination we would use Si as the 
pivot block to eliminate C2. Now in the position occupied by the block A^ we 
would get the matrix S3 = A^ — C2iS'^^-B2- Again, since S2 is a dense matrix, 
in general will be dense, and therefore S3 will also be a dense matrix. 
What this implies is that during LU factorization of the sparse matrix, we 
will produce fill-in quickly that causes us to compute inverses (and hence LU 
factorizations) of dense nxn matrices. If we assume that these dense matrices 
have no structure, then we would need 0{n'^) flops for that operation alone. 
Therefore, it follows that one would require at least 0{mn^) flops to compute 
block LU factorization of the system matrix. 

At flrst glance it seems that block LU factorization is not a wise approach 
as it does not use any kind of fill-in minimisation reordering. However, there is 
a rich structure in successive Schur complements 5*^ that can be exploited to 
speed-up computations. In fact, it has been conjectured that if one looks at the 
off-diagonal blocks of these matrices {S2, S3, etc.), then their e-rank (number 
of singular values not greater than e) is going to be small. This conjecture has 
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Fig. 1 Growth of £-rank (quasiseparable order) with size for different e. 



been justified by the fact that, for example, Sk can be viewed approximately 
(especially in the limit as n becomes large) as a sub-block of the discretized 
Green's function of the original PDE. It is known from the theory of elliptic 
PDEs (see, for instance, [13]) that under some mild constraints the Green's 
function is smooth away from the diagonal singularity. This, in turn, implies 
that the numerical ranks of the off-diagonal blocks of Si are small. There have 
been several recent attempts to quantify this observation. It has been proved 
in [5] that e-rank of the off-diagonal blocks of Schur complements in the LU 
decomposition of the regular 2D-Laplacian are bounded by 



1 



!ln^ 



This bound is not effective for any reasonable size of the problem because of 
the 4'th power in the logarithm (for instance for e =1.0e-6 it gives 6.2e-|-5). 
However, numerical experiments with Laplacian discretized by the usual five- 
point stencil suggest much lower bound, see Figure [l] 

Careful exploitation of the low-rank structure of Schur complements re- 
duces the complexity of block LU factorization algorithm to linear 0{mn). 
Indeed, as it has been mentioned in the previous section, all the operations 
with quasiseparable matrices that are needed at each step of the algorithm 
have 0{n) complexity and there are m steps altogether. First algorithms of 
this type were recently developed in [25 l fT9 ] . These algorithms, although ef- 
ficient in the case of simple PDEs, are not applicable to PDE-constrained 
optimization problems, such as the one described in Section [2] Saddle-point 
matrices arising there are dense block matrices with blocks themselves being 
discretized PDE matrices. To use block LU factorization in this case one would 



need to invert PDE matrices of type (17 1. We next show that it is possible by 



defining the right structured representation of inverses of PDE matrices. 
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Definition 3 (Multilevel quasiseparable matrix) Matrix A is called d- 
level quasiseparable if it admits the representation 



P2Q1 



G1H2 
P3Q2 



G1B2H3 
G2H3 
D3 



PnAn-l ■ ■ ■ A2Q1 PnAn-1 ■ ■ ■ A3Q2 PnAj. 



■A4Q3 



G1B2 ■ ■ ■ Bn-lHn 
G2B3 . . . Bn~lHn 
G3-B4 • • ■ Bn-lHn 



where each of the parameters {Qk, A^, Pk, Dk, Gk, Bk, Hk} is a block matrix 
of blocks being (d — l)-level quasiseparable matrices. 1-level quasiseparable 
matrix is a usual quasiseparable matrix, see Definition [T] 

Let us give some simple examples of multilevel quasiseparable matrices to 
justify the usefulness of the proposed structure. 

Example 1 (Block banded matrices with handed blocks) Any banded matrix is 
automatically quasiseparable. Similarly, block banded matrix is 2-level quasi- 
separable if blocks are themselves banded matrices. As an example, consider 
2D Laplacian on the square discretized by Ql finite elements; 



K = 



Example 2 (Tensor product oj quasiseparable matrices) Matrix A®B is 2-level 
quasiseparable if A and B are 1-level quasiseparable. Hence, 2D mass matrix 
M on the square discretized by Ql finite elements: 

■4 1 

1 ■•. ■■. 



'A B 




"-8 1 




"1 1 




B ■■ ■■ 




1 '■. '■. 




1 '•. ■•• 


• (18) 






'■■ ■■ B 




■■. ■■. 1 




'■. ■■. 1 




B A_ 




1 -8 




1 1_ 





M 



) T, T = - 
6 



■. 1 

1 4 



(19) 



is 2-level quasiseparable. Its inverse is also 2-level quasiseparable due to 
the tensor identity {A (E) B)^^ ~ A^^ (E) B^^ and properties of 1-level quasise- 
parable matrices 



Quasiseparable orders of multilevel quasiseparable matrices may differ at 
each level. For instance, it follows immediately from the properties of quasise- 
parable matrices that inverse of a 2-level quasiseparable matrix has a represen- 
tation as in Definition [3] for some generators {Qk, Ak, Pk, Dk, Gk, Bk, Hk}. 
However, there is no guarantee that entries of this generators are quasisepa- 
rable matrices of the same order as for the original matrix. 2D Laplace matrix 
in Example [T] is 2-level quasiseparable with generators of quasiseparable order 
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one but as Figure [T] illustrates, generators of LU factors (and, therefore, in- 
verse) have orders larger than one. However, if it is true that numerical orders 
(ranks) do not differ much from the original orders we may say that inverses 
of multilevel quasiseparable matrices retain the structure approximately. This 
observation, in its turn, leads to algorithms of linear complexity. 

To compute LU factorization of a 2-level quasiseparable matrix we may 
use Algorithm [T] It consists of approximately 12n arithmetic operations with 
1-level quasiseparable matrices. Sum or difference of rank-ri and rank-r2 ma- 
trices may have rank ri+r2- Similarly, any arithmetic operation (except inver- 
sion) performed on quasiseparable matrices may add their orders in the worst 
case. Therefore, to use multilevel quasiseparable structure efficiently we need 
some kind of order compression algorithm. Such algorithm was proposed in 
[TT] . we list it below for the convenience of the reader. This algorithm consti- 
tutes the essential part of the proposed method. 



Algorithm 2 Quasiseparable order reduction. 

Input: qfc, a^, of sizes X n^, rj. X rj._i and n^. X rj._i, respectively. 
1: Using LQ or SVD decomposition determine matrices Li and q'^ of sizes ri X r[ and 

rj X ni, respectively, such that qi = Lit/j, where q'liq'i)* = I^i^ and = rankqi. 
2: for A; = 2 to 71 — 1 do 

3: Using LQ or SVD decomposition determine matrices and Vfe of sizes rj. X and 
^'k ^ i^'k-i + ^k), respectively, such that [atLk-i qk] = LkVk, where V^(Vfe)* = -^rj. 
and r'f^ = rank[afcLfe_i qk]. 

4: Split Vk into the new generators a'^., 5^ of sizes rj, X r'f,_-^, r'^ X n^, respectively: 

Vk = K <i'k\- 

5: v'k = PkLk-i 
6: end for 

7: p'n = PnLn-1 

8: Using QR or SVD decomposition determine matrices pj^ and Sn-i of sizes n„ x r^ j 
and r'^_i X r'^_-^, respectively, such that pjj = p'^Sn-i, where {Pn)*Pn = 1^" ^ and 

= rankp^,. 
9: for A; = n - 1 to 2 do 

10: Using QR or SVD decomposition determine matrices Uk and Sk—i of sizes (uk + 
r^') X 7"^'_j and r'^_-^ X respectively, such that [pj. ; S^aJ.] = UkSk-i, where 

{UkrUk = Ir-_^ and r^'_i = rankp'^. 

11: Split Uk into the new generators p'^, a'^ of sizes Uk x ?'^'_]^, r'^ x t'^_i, respectively: 

Uk = [P^; ; <']. 

12: q'l = Skq', 
13: end for 

14: q'^ = 5igi 

Output: New generators g^f, a'^, p'^ of minimal possible sizes r^' X Uk, f^' x r'i^_i and 
Uk X t'^_-^, respectively. 



In exact arithmetic, new sizes r'^ of generators match ranks of the corre- 
sponding submatrices in the quasiseparable matrix (see for the proof). In 
floating point arithmetic we may use truncated SVD or rank-revealing LQ/QR 
to decrease sizes of generators to the value of the e-rank of the corresponding 
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submatrix. When e-ranks are small, complexity of Algorithm [2] is 0{n) and, 
hence, the following fact takes place. 

Complexity of algorithms listed in Table with 2-level quasiseparable ma- 
trices (Definition^ is linear in the size if quasiseparable orders of 1-level 
quasiseparable matrices stay small during computations. 

In the next section we will demonstrate practical properties of multi- 
level quasiseparable matrices approach applied to solving PDEs and PDE- 
constrained optimization problems. 



5 Numerical results. 

We illustrate our new method with two different examples: 2D Laplace's equa- 
tion with Dirichlet boundary conditions and distributed control problem with 
a constraint in the form of 2D Poisson's equation. We show that the proposed 
method can be used either directly or as a preconditioner for an iterative 
method depending on the level of order reduction used in Algorithm [2] Al- 
though we consider symmetric problems only, our approach does not require 
symmetry and only depends on low rank properties of operators. Therefore, 
it can also be applied to convection-diffusion and Helmholtz problems. We do 
not consider 3D problems as it is not clear yet how to adapt SVD and LQ/QR 
decomposition used in Algorithm [2] to the block case. 

Our software is written in C+-|- and is compiled with GCC compiler v. 4. 2.1. 



It is freely available for download at http: //www. maths. ed.ac.uk/ERGO/ 



software.html. All tests were done on a 2.4 GHz Intel Core 2 Duo with 
4 Gb of RAM. 

Example 3 Let Q = [0, 1]^ and consider the problem 

- 0' - (20) 
'U' = u\q[2, on dS2, 

where 

sin(2TTy) if a; = 0, 
-sin{2'!iy) if a; = 1, 
otherwise. 

Let us introduce a square reqular grid on [0, 1]^ of mesh size h — l/{n+l) 



and discretize equation (20) on this grid by Ql finite elements. Then equation 



(20) reduces to a matrix equation with matrix K in (18). This matrix, as it 
was noted before, is 2-level quasiseparable and, therefore, we can apply fast 
LDU decomposition Algorithm [T] with fast order-reduction Algorithm [2] at 
each step. Maximal quasiseparable order can be chosen adaptively in SVD 
but we set it manually to 4 and 8 for simplicity. Larger order corresponds 
to better approximation and, hence, more accurate result although it makes 
algorithm slower. Results of the computational tests for different mesh sizes 
are presented in Tables [3] and |4] Note, that our solver was implemented for 
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unsymmetric problems and did not use symmetry of the matrix in the current 
example. Exploiting the symmetry would roughly halve the time and memory 
used. 

Table 3 Time and memory used by direct quasiseparable LDU decomposition based solver 
applied to the problem in Example [s] quasiseparable order (maximal off-diagonal rank) is 
set to 4 during computations. 



n LDU mem. Mb solve H^""-^"'^ 

1I7II2 



212 


0.11 


3 


0.003 


8.22e-05 


2i4 


0.51 


12 


0.011 


1.85e-04 


216 


2.18 


51 


0.043 


3.93e-04 


218 


9.00 


210 


0.169 


6.91e-04 


220 


36.88 


847 


0.677 


8.81e-04 



Table 4 Time and memory used by direct quasiseparable LDU decomposition based solver 
applied to the problem in Example |3] quasiseparable order (maximal off-diagonal rank) is 
set to 8 during computations. 



n 


LDU 


mem, Mb 


solve 


\\Au-f\\2 
II/II2 


212 


0.19 


4 


0.003 


3.31e-09 


214 


0.91 


19 


0.013 


6.19e-08 


2I6 


4.09 


83 


0.052 


5.72e-07 


2I8 


17.44 


342 


0.209 


2.33e-06 


220 


72.83 


1388 


0.852 


5.41e-06 



The analysis of results collected in Tables [3] and [4] reveals the linear depen- 
dence of time and memory used by the algorithm on the size of the problem. 
The computational experience supports our claim about asymptotically linear 
complexity of algorithms with multilevel quasiseparable matrices. Note also the 
linear dependence of time and the sub-linear dependence of memory on the 
maximal quasiseparable order used during computations. 

If we truncate the order of quasiseparable matrices in the quasiseparable 
LDU decomposition to a very small number, the decomposition becomes less 
accurate. However, the algorithm becomes much faster and it is tempting to 
try this approximate LDU decomposition as a preconditioner in the precon- 
ditioned conjugate gradient method. Table [5] bellow illustrates the numerical 
performance of this approach. Results indicate that the PCG method precon- 
ditioned with the inexact LDU decomposition is twice as fast as more accurate 
LDU decomposition alone. It is enough to truncate quasiseparable order to 2-4 
to force PCG to converge in less than 10 iterations. 

Solution of problem in Example [3] obtained with PCG method with quasi- 
separable preconditioner is given in Figure [2] below. 
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Table 5 Time and number of iterations (in brackets) used by PCG when applied to the 
problem in Example[3] preconditioner is based on approximate quasiseparablc LDU decom- 
position with order tolerance r, system is solved to the l.Oe-08 accuracy. 



n 


r 


T rM T 

ijJJ u 




total time 


2i2 


1 


0.05 


0.0359 (9) 


0.09 




2 


0.07 


0.0252 (6) 


0.09 


214 


1 


0.21 


0.212 (14) 


0.42 




2 


0.30 


0.144 (9) 


0.45 


2i6 


3 


1.76 


0.482 (7) 


2.24 




4 


2.18 


0.282 (4) 


2.47 


2i8 


3 


7.17 


2.96 (11) 


10.13 




4 


9.05 


1.98 (7) 


11.03 


220 


4 


37.07 


10.1 (9) 


47.22 




5 


45.63 


8.19 (7) 


53.83 




Fig. 2 Solution of problem in Example |3| 

Example 4 Let i7 — [0, 1]^ and consider the problem 

s.t. -V^M = /, in f2, 
u = u\dn, on df2, 



where 



l)2(2y-l)2 if (x,y)G [0,i]2, 
otherwise. 
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As it was already discussed in Section [2] problem (21 1 can be solved by the 
discretization by Ql finite elements. Discretized system of equation, in this 
case, becomes 



2/3M -M 

M 
-M K 



f 







u 




b 


A 




d 



(22) 



After we eliminate variables f and u in equation (22 1 it reduces to 



2/? 



(23) 



This system is sometimes called normal equation in optimization community. 



Computing matrix on the left side of ( 23 ) is prohibitively expensive as it is 



usually very large and dense. Even more prohibitively expensive is solving the 
system with this matrix as it is 0{n^) algorithm. One common approach is 
to drop either or KM^^K^ in the equation and to solve system with 

the remaining matrix, thus, using it as a preconditioner (see, for instance, 
[21j). However, this preconditioner would perform badly if none of the terms 
is dominant. We propose a completely different approach. Matrices M and 
K in (|23]) are multilevel quasiseparable (see Examples [I] and [2]) . Thus, we 
can compute the Schur complement matrix in the left hand side of ( 23 1 in 



quasiseparable arithmetic using 0{n) arithmetic operations if quasiseparable 
orders stay small during computations. The last condition is true in practice 
because Schur complement is itself a discretized elliptic operator similar to 
Laplacian. In the previous example we have shown that it is more effective 
to use the proposed approach as a preconditioner rather than a direct solver. 



Thus, our approach to solving equation ( 22 ) consists of the following steps 



1. Invert matrix M and form matrix S — -^M + K M^^ K'^ in quasiseparable 
matrices arithmetic. 

2. Compute an approximate LDU decomposition of S using Algorithm [l] and 
some order reduction tolerance. 

3. Use PCG method to solve S\ — y with approximate LDU decomposition 
as a preconditioner. 

4. Exploit computed A, A/^^ and the factorization (|6| to find f and u. 

We have realized the proposed approach in practice. Tables [6] and [7| gather 
the numerical results we have obtained while solving the problem in Example 
11 

Analyzing the results presented in Tables [6] and [7] we conclude that our pre- 
conditioner is mesh-independent and, as in the case of simple PDE problem, 
CPU time for the overall algorithm grows linearly with the size of the prob- 
lem. Control function / and obtained solution u computed with the proposed 
algorithm for different values of the regularization parameter /? are presented 
in Figure [3) 
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Table 6 Time required to construct the normal equation matrix 5 in the left hand side of 
l |23| with /3 = IQ— 2, compute its approximate LDU decomposition and solve system with 
it to a specified tolerance using PCG method (number of iterations is given in brackets). 
Quasiseparable order used in order reduction algorithm was set to 1. 



/3 = 


10-2, tol 


= 10-* 






n 


S 


LDUCS) 


PCG 


total time 


12 


0.73 


1.12 


0.0839 (3) 


1.93 


14 


3.23 


5.01 


0.337 (3) 


8.57 


16 


13.57 


21.12 


1.37 (3) 


36.05 


18 


55.75 


86.85 


3.91 (2) 


146.51 


P = 


10-2, tol 


= 10-8 






n 


S 


LDU{5) 


PCG 


total time 


12 


0.71 


1.11 


0.13 (5) 


1.95 


14 


3.23 


5.00 


0.534 (5) 


8.76 


16 


13.58 


21.11 


2.17 (5) 


36.86 


18 


55.79 


86.83 


8.76 (5) 


151.38 



Table 7 Time required to construct the normal equation matrix 5 in the left hand side of 
l |23| with /3 = 10—^, compute its approximate LDU decomposition and solve system with 
it to a specified tolerance using PCG method (number of iterations is given in brackets). 
Quasiseparable order used in order reduction algorithm was set to 1. 



/3 = 


lO-'^, tol 


= 10-* 






n 


S 


LDU(S') 


PCG 


total time 


12 


0.71 


1.12 


0.0344 (1) 


1.86 


14 


3.23 


4.98 


0.14 (1) 


8.35 


16 


13.57 


21.05 


0.566 (1) 


35.19 


18 


55.83 


86.58 


2.28 (1) 


144.70 


/3 = 


lO-'^, tol 


= 10-8 






n 


S 


LDU(S') 


PCG 


total time 


12 


0.71 


1.11 


0.058 (2) 


1.88 


14 


3.23 


4.98 


0.238 (2) 


8.45 


16 


13.57 


21.05 


0.966 (2) 


35.59 


18 


55.79 


86.71 


3.9 (2) 


146.40 



6 Conclusion. 

In this paper we have introduced a new class of rank-structured matrices called 
multilevel quasiseparable. This matrices are low-parametric in the sense that 
only 0{n) parameters are needed to store a matrix. Moreover, arithmetic op- 
erations and matrix decompositions can be performed in 0{n) floating-point 
operations. Multilevel quasiseparable matrices extend the applicability of well- 
known quasiseparable matrices |H] from one-dimensional to multidimensional 
problems. In particular, we have shown that multilevel quasiseparable struc- 
ture is well-suited to the description of discretized elliptic operators in 2D. 
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Fig. 3 Plots of the state, u, and the control, /, for f3 = 10^ and 10 




To demonstrate the usefulness of the new class of matrices we considered 
distributed control problem with a constraint in the form of a partial differen- 
tial equation. Such problems were introduced by Lions and Mitter in [18, . In 
the course of solving them large-scale block matrices of saddle-point type arise. 
A straightforward way of solving these systems is to use block LU factoriza- 
tion. This is impractical as it requires direct inversion of large PDE matrices 
and further manipulations with them. However, we have shown that inverses 
of PDE matrices can be approximated by multilevel quasiseparable matrices 
with any desired accuracy and, hence, what was impossible in dense linear 
algebra became possible in structured linear algebra. 

Development of numerical methods for solving systems of saddle-point type 
is an important subject of modern numerical linear algebra, see an excellent 
survey paper by Benzi, Golub, and Liesen [3] for details. A large amount of 
work has been done on developing efficient preconditioners for such systems. In 
the current paper we have also proposed a new efficient mesh-independent pre- 
conditioner for saddle-point systems arising in PDE-constrained optimization. 
Performance of the new preconditioner is studied numerically. 

There are several open questions left for future research. In particular, it is 
not clear what the range of applicability of multilevel quasiseparable matrices 
is. We have only considered problems with symmetric differential operators in 
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our examples. However, theory does not require symmetry in the matrix and 
it may very weh be that multilevel quasiseparable matrices are applicable to 
convection-diffusion and other unsymmetric operators as well. Next, robust 
techniques are needed to make the approach work on non-tensor grids. The 
biggest question though is how to extend order reduction Algorithm [2] to the 
block case to be able to use multilevel quasiseparable matrices in 3D and higher 
dimensions. 
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